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ABSTRACT 

We present a detailed analysis of four of the most widely used radio source find- 
ing packages in radio astronomy, and a program being developed for the Australian 
Square Kilometer Array Pathfinder (ASKAP) telescope. The four packages; SExtrac- 
tor, SFIND, IMSAD and Selavy are shown to produce source catalogues with high com- 
pleteness and reliability. In this paper we analyse the small fraction (~ 1%) of cases 
in which these packages do not perform well. This small fraction of sources will be of 
concern for the next generation of radio surveys which will produce many thousands 
of sources on a daily basis, in particular for blind radio transients surveys. From our 
analysis we identify the ways in which the underlying source finding algorithms fail. 
We demonstrate a new source finding algorithm Aegean, based on the application 
of a Laplacian kernel, which can avoid these problems and can produce complete and 
reliable source catalogues for the next generation of radio surveys. 
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1 INTRODUCTION 

Source finding in radio astronomy is the process of finding 
and characterising objects in radio images. The properties 
of these objects are then extracted from the image to form 
a survey catalogue. The aim of large scale radio imaging 
surveys is to provide an unbiased census of the radio sky, 
and hence the ideal source finder is both complete (finds all 
sources present in the image) and reliable (all sources found 
and extracted are real). 

Most of the standard source finding algorithms that 
have been developed over the last few decades are highly 
complete and reliable, missing only a small fraction of 
sources. These problem cases are generally dealt with in pre 
or post-processing, or manually corrected in the source cat- 
alogue. 

Next generation radio surv eys such as the E volution- 
ary Map of the Universe (EMU: lNorris et al.|[2oTll 'l and the 
ASKAP Survey for Va riables and Slow Transients (VAST; 
IChatteriee et al] I201C ) planned for the A ustralian SKA 
Pathfinder (ASKAP; Ijohnston eT^ 120081 ') Telescope will 
produce large area, sensitive maps of the sky at high ca- 
dence, resulting in many times more data than previous sur- 
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veys. Data processing will need to be fully automated, with 
limited scope for manual intervention and correction. Hence 
the small number of missing or incorrectly identified sources 
produced by current source-finders will pose a substantial 
problem. In particular, in blind surveys for radio transients, 
missed sources and false positives in an epoch will cause the 
transient detection algorithms to trigger on false 'events'. 
VAST will need to extract thousands of sources from sur- 
vey images at a cadence of ~ 5 seconds. A source finding 
algorithm which is 99% complete and reliable at a signal-to- 
noise ratio of 5, will be producing ~ 10, 000 false sources, 
and missing ~ 10, 000 real sources per day. Whilst it is pos- 
sible to remove false sources from a catalogue, the missing 
real sources are lost forever. The large data rates of tele- 
scopes like ASKAP will make it impossible to store each 
observation, and thus no reprocessing of the data will be 
possible. 

The way in which a source finding algorithm fails to 
detect a real source is often assumed to be related to noise, 
and that it is random. In this paper we test this assump- 
tion and show that whilst many sources are missed due to 
random noise related effects, there is also a component that 
is deterministic and related to the underlying algorithm. By 
analysing the source finding algorithms and their modes of 
failure we identify ways in which the algorithms could be im- 



2 Hancock et al. 



proved and use this knowledge to build an algorithm which 
can produce catalogues that are more complete and more 
reliable than those currently available. 

In this paper we discuss the problem of source-finding 
for Stokes I continuum radio emission in the context of next 
generation radio imaging surveys. We will not deal directly 
with the additional complications introduced by spectral 
line data, polarization data or extended sources and diffuse 
emission. The focus of this paper is on upcoming surveys 
for ASKAP, however the results will be equally applicable 
to future radio surveys on other SKA path finder instru- 
ments such as the M WA (jLonsdale et al.ll2009l 'l and SKAMP 
l|Adams et al.ll2004l ). and of course the SKA itself. 

In §[2] we outline the main approaches to source-finding 
in radio astronomy, and then in § |3] describe four of most 
widely used source finding packages. § 13.71 gives some ex- 
amples of the way that source finding packages are used 
to create catalogues for large surveys and transient studies. 
§ [4] describes the test data that was used in the analysis of 
the source finding algorithms, and § [5] describes the eval- 
uation process. The instances in which the source finding 
algorithms fail to find or properly characterise sources are 
described in §[6l §[7]describes a new source finding algorithm, 
Aegean, which has been designed to overcome many of the 
problems suffered by existing source finding packages. We 
summarise our conclusion in §(8] 



2 SOURCE-FINDING IN RADIO 
ASTRONOMY 

In a broad sense, source finding in radio images involves find- 
ing pixels that contain information about an astronomical 
source. Most approaches to source finding in radio astron- 
omy follow a similar method: (i) background estimation and 
subtraction; (ii) source identification; (iii) source character- 
isation; and (iv) cataloguing. In this section we outline the 
standard method taken in each of these steps. 

In the discussion that follows we consider a source to be 
a signal of astronomical importance that can be well mod- 
eled by an elliptical Gaussian. By this definition a radio 
galaxy with a typical core/jet morphology would be made 
up of three sources, one for the jet and each of the lobes. The 
grouping of multiple sources into a single object of interest 
(like a core/jet radio galaxy) is not in the realm of source 
finding or classification as it relies on contextual information 
to make such an association. 



2.1 Background estimation and subtraction 

The first step in source finding is determining which parts of 
the image b elong to sources an d which belong to the back- 
ground fee. iHuvnh et alll201lh . The most common way in 
which this separation is achieved is to set a fiux threshold 
that divides pixels in to background or source pixels. This 
process is referred to as thresholding. 

A straightforward case would involve a background that 
is dominated by thermal noise, which is without structure 
and is constant across the entire image. In such a case a 
single threshold value can be chosen that will result in all 
sources above that threshold being detected, and some small 
number of false detections. A varying background can be 



accounted for by calculating the mean and rms noise in lo- 
cal sub-regions, which is then used to normalize the image 
before applying a uniform threshold in signal-to-noise. The 
selection of a threshold limit is often a balance between de- 
tecting as many real sources as possible and minimising the 
number of false detections. Typically a 5(7 threshold limit is 
used in a blind survey, with higher or lower limits chosen for 

larger or smaller regions of sky. 

False detection rate (FDR) analysis (|Hopkins et al.l 

determines the threshold limit that will result in a 
number of falsely detected pixels that is lower than some 
user defined limit. 

In cases where the background has structure, an im- 
age filter must be used to remove the background structure 
before the source-finding stage. The way in which the back- 
ground structure is removed depends on the cause and type 
of structure that is present. A common example is diffuse 
emission in the galactic plane, with compact sources embed- 
ded within. A discussion of background filtering techniques 
is beyond the scope of this paper, and in our analysis we 
assume the images have been pre-processed and are free of 
background structure. For an ev aluation of background es- 
timation see iHuvnh et all (|201ll '). 



2.2 Source Identification 

Source identification is the process by which pixels that 
are above a given threshold are grouped into contiguous 
groups called islands. Each island corresponds to one or 
more sources of interest. The process of finding sources is 
complete at this stage. The format of the catalogue is just a 
list of pixels that belong to each of the islands, which is not 
of general astronomical utility. Source characterisation is re- 
quired to convert these islands of pixels into a more useful 
form. 



2.3 Source Characterisation 

Source characterisation involves measuring the properties of 
each source, for example the total fiux and angular size. 
The best source characterisation method is strongly depen- 
dent on the nature of the sources that are to be studied. 
Point sources, by definition, have the shape of the point- 
spread-function (PSF) of an image, making the PSF shape 
important in the characterisation process. Images that are 
produced from radio synthesis observations have been de- 
convolved and the complicated PSF of the instrument has 
been replaced with an appropriately scaled Gaussian. Ob- 
servations with sufficient u, v coverage will do not need to 
be deconvolved as they have a PSF that is already nearly 
a Gaussian. In either case, compact sources will appear as 
Gaussian, and so an island of pixels can be characterised by 
a set of Gaussian components. 

In lower resol ution radio surveys such as the 



NVSS (45arcsec^ ICondon et al.lll9 9S) and SUMSS (45 x 



45cosec|5| arcsec^ . iMauch et al.ll2003 ). a majority of objects 
are unresolved and can be characterised by a single Gaus- 
sian. How ever, in higher reso lution surveys such as FIRST 
(5 arcsec^ , iBecker et al.l Il995h a significant fraction of the 
sources are partially extended or have multiple components, 
and so multiple Gaussians are required to represent them. 
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Fitting a number of Gau ssian compone nts to an island 
of pixels is straightforward (|Condonlll997l l. but is highly 
sensitive to the choice of initial parameters. Gaussian fitting 
can converge to unrealistic or non-optimal parameters due to 
the many local minima in the difference function. Effective 
multiple Gaussian fitting requires two things: an intelligent 
estimate of the starting parameters, and sensible constraints 
on these parameters. None of the widely used source-finding 
packages have an algorithm for robustly estimating initial 
parameters for a multiple Gaussian fit. 

Two approaches have been developed which try to ad- 
dress the difficulty of obtaining accurate initial parameters 
for multiple Gaussian fitting: de-blending, and iterative fit- 
ting. A de-blending based approach breaks an island into 
multiple sub- islands, each of which is fit with a single compo- 
nent. In an iterative fitting approach, the difference between 
the image data and the fitted model (the fitting residual) is 
evaluated in order to determine weather an extra component 
is required. This analysis will repeat until an acceptable fit is 
achieved, or a limit on the number of components has been 
reached. De-blending and iterative fitting are both suscep- 
tible to source fragmentation, whereby a single true source 
is erroneously represented by multiple components. 

Once each island of pixels has been characterised the 
fitting parameters are catalogued. 



2.4 Cataloguing 

The final stage in source finding is extracting the source 
parameters and forming a catalogue of objects in the field. 

A catalogue should contain an appropriate listing of ev- 
ery parameter that was fit, along with the associated un- 
certainties. In addition to the fitted parameters, a source 
finding algorithm should report instances where the source 
characterisation stage was inadequate or failed. By report- 
ing sources that were not well fit, a catalogue can remain 
complete despite having measured some source parameters 
incorrectly. Poorly fit sources can easily be re-measured, 
whereas excluded sources are missed forever. If it is not pos- 
sible to construct a reasonable facsimile of the true sky using 
only the information provided in the source catalogue then 
the source finding process has not been successful. 



3 SOURCE-FINDING PACKAGES AND 
THEIR ALGORITHMS 

Most of the major source- finding packages in astronomy are 
based on a few common algorithms. In this section we outline 
the features of these packages. 

Source finding packages that rely on wavelet analysis 
were not considered in this work as none of the most widely 
used source finding packages rely on wavelet analysis. 

3.1 SExtractor 



The source finding and characterisation process that 
SExtractor follows can be modified via an extensive param- 
eter file. For this work the following parameters were used: 



SExtractor (SE; ' Bertin Sz Arnout^l 19961 ) was developed for 
use on optical images from scanned plates. The speed and 
ease of use of SExtractor has made it a popular choice for 
radio astronomy despite its optical astronomy origins. SEx- 
tractor is a stand alone package for Unix-like operating sys- 
tems. 



DETECT.MINAREA 
THRESH.TYPE 
DETECT.THRESH 
ANALYSIS.THRESH 

MASK.TYPE 

BACK^IZE 

BACK_FILTERSIZE 



ABSOLUTE 

125e-6 

75e-6 

CORRECT 

400 

3 



The first four parameters instruct SExtractor to de- 
tect all sources with a peak pixel brighter than 5a = 
125/iJy/beam. The source characterisation is then carried 
out on islands of pixels that are brighter than 3a = 
75/^ Jy /beam that contain at least 5 pixels. The final three 
parameters ensure that the measured fiux of a source is cor- 
rected for the effects of nearby sources, and that the back- 
ground is estimated using a box of 3 x 400 pixels on a side. 
This large background size results in a background that is 
less than lfi3y for each of the tested images. The param- 
eters DEBLENDJ^THRESH and DEBLENDJvIINCONT 
are used by SE in the source characterisation stage, when 
deciding how many components are contained within an is- 
land of pixels. The ability of SE to characterize sources was 
found to be insensitive to these parameters for the simulated 
images used in this work. 



3.2 IMSAD 

Image Search and Destroy (i msad) is an image based source 
finding algorithm in miriad (|Sault et al ] |l995h . The thresh- 
old is user-specified either as an absolute fiux level or as a 
signal to noise ratio (SNR) with the background noise de- 
termined from a histogram of pixel values. Only pixels that 
are brighter than the threshold are used in the fitting pro- 
cess. For the analysis presented in this work we specify a 
threshold of 5a — 125/.tJy/beam. imsad performs a single 
Gaussian fit to each island of pixels. 



3.3 Selavy 

Selavy is the source finding package that is being developed 
by ASKAPsoft as part of the data processing pipeline for 
ASKAP. Selavy is a source finding package that is able 
to work with spectral cubes and continuum images, and 
includes a number of different algorithms and approaches 
to source finding. Selavy is related to the publicly avail- 
able Duchamp software (Whiting, 2012, MNRAS, in presfd). 
Selavy is a version of the Duchamp software that has been in- 
tegrated into the ASKAPsoft architecture to run on a highly 
parallel system with distributed resources. In the context of 
compact continuum source finding the only difference be- 
tween Selavy and Duchamp is that Selavy is able to param- 
eterize and island of pixels with multiple Gaussian compo- 
nents. Selavy was given a threshold of 5a = 125/^Jy/beam 
for source detection. 



^ http://www.atnf.csiro.au/people/Matthew.Whiting/Duchamp 
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3.4 SFIND 



SFIND (jHopkins et al.|[20o3 ) is implemented in MIRIAD and 
uses FDR analysis to set the detection threshold. Source 
characterisation is performed using the same Gaussian fit- 
ting subroutine as that use by the MIRIAD task imf it. 

A varying background is calculated by SFIND by dividing 
the image into sub-regions of (user defined) size, and mea- 
suring the mean and rms of each region. In an image which 
contains a high density of sources, or sub-regions which con- 
tain a particularly bright or extended source, the calculated 
mean and rms will be contaminated by the sources. For sub- 
regions where this occurs the result is a mean and rms value 
that is significantly different from the adjacent sub-regions, 
which can cause sources on the boundaries to be normalised 
such that their shape and flux distribution are not preserved. 
For an image which is constructed to have a zero mean and 
constant rms, these contamination effects can be largely re- 
moved by setting the size of the sub-regions to be larger 
than the given image. 

The rejection of sources which fail to be fit with a Gaus- 
sian rejects many instances of sources that have very few 
pixels. This has the effect of further decreasing the false de- 
tection rate for the catalogue, since a false positive source 
needs to have many contiguous false positive pixels in order 
to be fit properly. 

In this work we selected the sub-regions to be larger 
than the given image, and adjusted the FDR parameter 
until the automatically selected threshold was at 5(t = 
125/iJy/beam. 



3.5 FloodFill 

FloodFill is an algorithm which performs the second stage 
of source finding, separating the foreground from the back- 
ground pixels, and grouping them into islands that are then 
passed on to the source characterisation stage. We describe 
FloodFill as implemented in the new source finding algo- 
rithm, Aegean, whi ch is described in SJT] Although used by 
iMurphv et all (|2007l ). the details of the algorithm have not 
been described in the astr onomical literature (although see 
iRoerdink fc Meiisteij[200ll ). 

FloodFill takes an image and two thresholds (cts and 
cr/, with as ^ o"/). Pixels that are above the seed threshold 
Gs are used to seed an island, whilst pixels that are above 
the flood threshold o"/ are used to grow an island. Given a 
single pixel above CTs, FloodFill considers all the adjacent 
pixels. Adjacent pixels that are above a/ are added to the 
island and pixels adjacent to these are then considered. This 
iterative process is continued until all adjacent pixels have 
been considered. The operation of FloodFill is demonstrated 
on a simplistic 'image' in Figure [T] In panel A the brightest 
pixel in the image has been chosen to seed the island, and 
is coloured yellow. The adjacent pixels are coloured cyan. In 
panel B the pixels that are adjacent to the seeding pixel are 
added to the island as they are brighter than cr/ =4. Pixels 
adjacent to the island are now considered. The process is 
repeated in panel C. In panel D some of the adjacent pixels 
are now below cr/ and are thus not added to the island, 
and are flagged as background pixels. In panel E there are 
no longer any pixels adjacent to the island which have not 
been rejected so the search for new pixels halts. In panel 
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Figure 1. A demonstration of the operation of the FloodFill 
algorithm. The small 'image' has pixel values that are in units 
of the background noise. The seed threshold is CTs = 5 and the 
flood threshold cr/ = 4. The orange pixels are those that have 
been assigned to an island, green pixels are those that are being 
considered, gray pixels have been rejected and white pixels are 
yet to be considered. In panel A the brightest pixels is used to 
seed an island. Pixels adjacent to the island are then inspected 
(coloured green). In panel B the pixels under consideration are 
brighter than the flooding threshold and are added to the island. 
The process is repeated in panels C and D. In panel E there are no 
more pixels adjacent to the island that have not been inspected. If 
panel F all pixels have either been assigned to an island (orange) 
or are labeled as background (gray). 



E there are no longer any pixels above the seeding limit of 
cr J, = 5 so all remaining pixels are flagged as background 
pixels (panel F). 

The operation of FloodFill is invariant to changes in 
the order in which the seed pixels are chosen. The output of 
FloodFill is a disjoint list of islands, each of which contains 
contiguous pixels that are above the cr/ limit. FloodFill does 
not perform any source characterisation, although it is able 
to report the flux of an island of pixels by summing the pixel 
intensities. The fluxes that are reported by FloodFill have a 
positive bias which can be corrected as described by Hales 
et al. (2011, in prep). 



3.6 Aegean 

FloodFill forms the basis for two new source flnding al- 
gorithms: BLOBCAT (Hales et al. 2011, in prep) and 
Aegean. Both algorithms begin with a set of islands iden- 
tified by FloodFill but characterise these islands differently. 
The BLOBCAT program characterises each island of pix- 
els without assuming a particular source structure, where 
as Aegean assumes a compact source structure in order to 
fit multiple components to each island. Here we outline the 
Aegean algorithm, with a detailed description differed to 

m 

The Aegean algorithm has been implemented in 
Python and uses FloodFill to create a list of islands of pix- 
els. Aegean was set to use a single background threshold of 
5cr to seed the islands, and a flood threshold of 4cr to grow 
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the islands. This threshold was set to 5a = 125/iJy/beam so 
that we are detecting sources above an SNR of 5. 

Aegean uses a curvature map to decide how many 
Gaussian components should be fit to each island of pixels, 
and the initial parameters for each component. A Python 
implementation of the MPFit librarjQis used to fit the Gaus- 
sian components with appropriate constraints. Each island 
of pixels is thus characterised by at least one Gaussian com- 
ponent. 



3.7 Source finding in radio surveys 

The process of creating a catalogue of sources from survey 
images involves more than running one of the source find- 
ing packages described in § (3] Since the observing strategy, 
hardware and data reduction techniques can vary widely 
between surveys, standard source finding packages are typi- 
cally used only as a starting point for the creation of a source 
catalogue. The time required to carry out the observations 
for a large area sky survey is typically spread over multiple 
years. This prolonged observing schedule is usually accom- 
panied by multiple iterations of calibration, data reduction 
and source detection, so that by the time the final observa- 
tions are complete it is possible to produce a survey cata- 
logue using a source finding pipeline that has been refined 
over many years. 

The NRAO Very Lar ge Array (VLA) Sky Survey 
fNVSS. ICondon et al.|[l99i ). drew upon observations from 
1993-1996, during which time the AIPS source finding rou- 
tine SAD was modified to create VSAD. The survey strategy 
for the NVSS was devised to give noise and sidelobe charac- 
teristics that were both low, and consistent across the sky. 
The configuration of the VLA was varied across the sky to 
ensure a consistent resolution throughout the survey. The 
survey strategy was thus designed to produce images that 
were nearly uniform across the sky, making the task of source 
finding as easy as possible. The completeness and reliability 
of the NVSS catalogue was improved in the years subsequent 
to the completion of the survey with the final stable release 
in 2002. 

The Sydney U niversity Molonglo Sky Survey (SUMSS, 
iMauch et al] 120031') and Molo ng o Galactic Plane Survey 
fMGPS-2. lMurphv et aLlbOOTl l both used the source finding 
package VSAD, however the single purely East-West con- 
figuration of the telescope meant that the resolution var- 
ied with declination, and the regularly spaced feeds pro- 
duced many image artefacts. The changing resolution and 
image artefacts meant that the source finding algorithm pro- 
duced many false detections. The image artefacts appeared 
as radial spokes or arcs around bright sources. In order 
to rid the source catalogue of falsely detect ed sources, a 
machine learn ing algorithm was implemented (|Mauch et al.l 
I2OO3I : iMurphv et al. 2007ty The machine learning algorithm 
was able to discriminate between real and false sources, but 
required substantial training to achieve high completeness 
and reliability. 

An archival transients survey has recently been 
completed using the data from the SUMSS survey 



^ code. google. com/p/agpy/source/browsc/trunk/mpfit/mpfit.py 
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ISannister et al.ll201ll ). In an archival search spanning 20 
years of observations, the need for a fast source detection 
pipeline is not important, as fast transients will not be de- 
tected, and s low transients w i ll rem ain visible for years to 
come. In the iBannister et al.l (|201ll ) survey, regions of sky 
with multiple observations were extracted from the archival 
SUMSS data. These regions of sky were analysed for sources 
which either changed significantly in flux, or which were de- 
tected in only a subset of the images. The SFIND package 
was used to detect sources, which were then remeasured 
using the miriad routine imfit. A complication that was 
encountered in the analysis of the SUMSS data, was the 
contamination of candidate source lists due to source find- 
ing errors. False positive detections and missed real sources 
both appear as sources which are only detected in a subset 
of all the images, and thus appear to be transient sources. 
The light curve of each transient event therefore needed to 
be double checked in order to remove such occurrences. 

A similar transient detection project was carried out 
with new observations from the Allen Telescope A r ray, i n 
the ATA Transients Survey (ATATS, ICroft et all bOlll l. 
Side-lobe contamination in the ATA imag es is much lower 
than that in the SUMSS images used in the lBannister et al.l 
(|201ll ) study, however falsely detected sources in the indi- 
vidual images still resulted in false transient detections and 
required further processing to remove. 

The process of finding sources and creating a catalogue 
extends beyond the operation of a source finding package 
and has previously required substantial manual intervention. 
The next generation of telescopes, particularly the dedicated 
survey instruments, will be able to complete observations 
on a much shorter time scale than current generation tele- 
scopes, and thus the time spent creating the refining the 
source catalogue will become a larger fraction of the total 
effective survey time. Source finding packages that are able 
to produce more accurate, complete and reliable catalogues 
will provide a better starting point for the final version of 
the survey catalogue. 



4 TEST DATA 

We used a simulated data set to evaluate the source-finding 
algorithms described in § [3] A simulated data set has the 
advantage that we are able to control the image properties 
(such as rms noise) and that we know the input catalogue. 

Matching recovered sources with a true list of expected 
sources is an important part of the analysis presented in this 
paper. With any real data set, the list of expected sources 
comes with some degree of uncertainty, in that these lists 
are recovered from incomplete and noisy reconstructions of 
the radio sky. To avoid such uncertainties we generated a 
master catalogue of sources, which was then used to create 
a simulated image of the sky. With absolute control over 
the input catalogue and image characteristics, we are able 
to make more definitive statements about the quality of the 
catalogues that are produced. 

The master source catalogue was generated with the 
following constraints: 

(i) Fluxes: The source peak flux are distributed as 
N{S) oc S~^-^, and within the range (25 fiJy, 10 Jy). 
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(ii) Positions: Sources were randomly distributed in space 
within one of ten regions of sky similar to that in Figure [2] 
Source clustering was not considered. 

(iii) Morphologies: The major and minor axes of each 
source were randomly distributed in the range — 52" with 
position angles in the range (— 90deg, +90deg). 

A simulated sky image was created, which contained 
each of the sources in the input catalogue. The image has 
a 30" sythesised beam, and a 25/iJy/beam rms Gaussian 
background noise. The sources were injected with a peak 
flux and morphology as listed in the catalogue. The size of 
the image is 4801x4801 pixels with a scale of 6" per pixel, 
resulting in a synthesised beam sampling of 5 pixels per 
beam. Regions of sky exterior to the catalogue contain noise 
but no sources. 

The simulated data-set can be found online at 
www .physics .usyd. edu. au/'hancock/ simulations 



5 SOURCE-FINDING EVALUATION 

The source finding packages described in § [3] were used to 
generate a catalogue of sources from the simulated images. 
Each source finding package was run with a 5(t threshold. In 
the case of SFIND, the FDR was chosen so that the resulting 
threshold was equal to 5(t. 

The source finding algorithms were evaluated by com- 
paring these catalogues with the input source catalogue. 
Three standard metrics that have been used in the compar- 
ison of catalogues, and hence source finders, are the com- 
pleteness, reliability, and fiux distribution, as defined and 
discussed in 15. 21- [5. 51 below. 



Source Counts 



5.1 Cross matching of catalogues 



Much of the analysis that will be discussed in ii l5.2l5.5l relies 
on the cross-identification of sources from two catalogues. 
A common criterion for accepting cross-identifications be- 
tween catalogues is to choose the association with the small- 
est sky separation, up to a maximum matching radius. 
To decrease the chances of false associations we also con- 
sider the flux of the source when choosing between multiple 
matches within a matching radius of 30". The distance in 
phase space, D, is given by: 



\og(D) = 



+ 



(1) 



where (a, 5) are (RA,DEC), S is the flux, and ctq = ct^ = 30" 
is the size of the convolving beam, and as ~ 25/iJy/beam 
is the image rms noise. 

5.2 Flux Distribution 

The analysis of the fiux distribution of a catalogue does 
not require catalogues to be cross-matched. Since the in- 
put source catalogue was constructed with a particular fiux 
distribution, we should expect to see this distribution repli- 
cated in the output catalogues. Figure [3] shows the fiux dis- 
tribution for each of the source finders compared to the input 
distribution. Except for Selavy, each of the catalogues have a 
fiux distribution that is consistent with the input catalogue. 




100 

Measured SNR 



Figure 3. The source count distribution of the catalogued 
sources. The true source count distribution is shown as a dotted 
line. Selavy consistently reports twice the true number of sources. 



An excess of sources can be a sign of spurious detections, 
whilst a lack of sources can be due to incompleteness. If a 
source finding algorithm has a fiux distribution that devi- 
ates from the ideal case, it indicates that something is wrong 
however the cause of the problem cannot be identified from 
this graph alone. Selavy has around double the number of 
sources at all fiux levels as it suffers from source fragmen- 
tation. Since the fragmented components are close to the 
true position of the original source, the completeness and 
reliability of Selavy are only partly compromised. 



5.3 Completeness 

The completeness of a catalogue at a fiux So is often defined 
as the fraction of real sources with true fiux S ^ So that are 
contained within the catalogue. In practice the completeness 
is measured as the number of sources with a measured fiux 
S ^ So that are contained within the catalogue. The two 
measures are comparable at large SNR, but when the SNR 
is ~ 5 the fiux of a source can be in error by ~ 20%. The 
completeness relative to the measured source fiuxes is also 
effected by Eddingto^ (|l913l ) bias, whereas the completeness 
relative to the true source fiuxes is not. 

The completeness of a source finder was determined by 
matching the simulated catalogue with each of the source 
finding catalogues. The completeness of a catalogue at a fiux 
So is then the fraction of real sources of fiux greater than 
So which are contained within the given catalogue. Figure [4] 
shows completeness as a function of injected SNR for each 
of the source finders. Plotted alongside each of the com- 
pleteness curves is a theoretical expectation of completeness 
for comparison. The expected completeness has been deter- 
mined by taking each of the sources in the input catalogue 
and calculating the probability that it will be seen at a par- 
ticular flux level, given the known rms in the image. The 
expected completeness is calculated as 
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Figure 2. Simulated image of the sky. The black line delineates the region of sky containing the injected sources. The colour bar ranges 
from — 5o" to 33a. 



C{So) 



J:sP{S>So) 
Es>sn^(^) 



P{S > So) = - 



4 log 2 



s'-s' 



(2) 



dS' (3) 



= lErfc f V^^(So-S) 
2 V a 

where P{S > So) is the probability that a source of 
flux 5* will be seen at a flux greater than 5o after noise has 
been included, and N{S)dS is the number of sources with 



flux between S and S + dS. Erfc is the complementary error 
function. At SNR > 6 all of the source finding packages pro- 
duce catalogues that are greater than 99% complete. The 
high completeness is due to, and responsible for, the wide- 
spread use of the given source finders. The different levels 
of completeness shown in Figure |4] is a direct result of the 
source finding algorithms implemented by each of the pack- 
ages. The performance of SFIND is comparable to the other 
source finders above an SNR of 7, but less complete below 
this SNR. The lower completeness is a result of sfind 's fo- 
cus on minimising the false detection rate, as is shown in 
Figure [5] Selavy and Aegean are the most complete source 
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Figure 4. The completeness of each catalogue as compared to the 
input catalogue. The coloured curves represent the completeness 
of the named source finders. The black dotted curve represents 
the expected completeness of an ideal source finder as calculated 
by Equations I2I3I 

finding packages at all SNRs, however Selavy achieves this at 
a cost of an increased false detection rate (see ij I5.4l and Fig- 
ure|4]). Aegean is able to achieve high completeness and low 
false detection rate at all SNRs. The completeness of each 
of the source finding packages is summarised in Table [T] 

5.4 False Detection Rate 

The false detection rate (FDR) of a source finding package 
at a fiux So, is defined as the fraction of catalogued sources 
with S ^ So which are not identified with a real source. The 
FDR of a source finder was determined by matching the re- 
sulting catalogue with the simulated source list. Catalogued 
sources which are not within 30" of a true sources are con- 
sidered false detections. The false detection rate of a source 
finding algorithm is related to the commonly used metric of 
reliability by: 

FDR + Rehability = 100% (4) 

In Figure [S] the FDR is plotted as a function of SNR for 
each of the source finding packages. Substituting a flux of 
S = ujy into equation |3] and considering the area of sky 
covered by the simulated images we expect that there is less 
than one false detection due to random chance. Thus an 
ideal source finding algorithm should have an FDR of zero. 
The > 1% FDR peaks shown in Figure [S] (especially for 
IMS ad) are due to islands of multiple sources that have not 
been properly characterised. SExtractor, imsad and Selavy 
have a higher false detection rate than sfind and Aegean, 
as the former are not able to accurately characterise islands 
of pixels. 

The single sources that are fragmented into multiple 
sources by Selavy often have positions that are close enough 
to the true position that they are not considered false detec- 
tions, and thus don't significantly impact the false detection 
rate. However at low SNRs, Selavy breaks single sources into 
three or even four components, and one or more of these 



Figure 5. The false detection rate (FDR) for each of the source 
finding algorithms. The FDR is entirely a function of the source 
finding algorithm. No falsely detected sources are expected above 
an SNR of 5 for the area of sky simulated. 



Package 


Completeness (%) 


Reliability (%) 






lOo- 


50(T 


5o- 


lOcr 


50cr 


IMSAD 


93.44 


99.50 


99.49 


97.17 


97.75 


99.66 


Selavy 


91.20 


99.92 


99.87 


96.96 


97.65 


99.93 


SE 


88.31 


99.77 


99.62 


98.68 


99.11 


100.0 


SFIND 


82.48 


99.81 


99.75 


96.09 


99.79 


100.0 


Aegean 


93.87 


99.91 


99.87 


98.69 


100.0 


100.0 


Ideal 


94.51 


100.0 


100.0 


100.0 


100.0 


100.0 



Table 1. The completeness and reliability of each of the source 
finding algorithms. The 'Ideal' case has been included for com- 
parison. 



components have a position distant enough from the true 
source that they are registered as false detections. This is 
evident in Figure [S] 

IMSAD suffers from the reverse problem to Selavy, in 
that it will never break islands into multiple components 
even when they contain multiple sources. The position that 
is reported by IMSAD in such situations can be sufficiently 
far from the true position that these islands are registered as 
false detections. All of the false detections for IMSAD above 
an SNR of 20 in Figure [5] are due to this flaw. 

The reliability of each of the source finding packages is 
summarised in Table [T] 

5.5 Measured parameter correctness 

For all measured catalogue sources that were identified with 
a true source it is possible to compare the measured param- 
eters to the known true values. 

Figure[n]shows the median absolute deviation (MAD) in 
position, as a function of SNR, for each of the source find- 
ing algorithms. The MAD is calculated for each SNR bin 
and is not a cumulative measure. An ideal source finding 
algorithm will have a typical error in position that is pro- 
portional to C/SNR^ where C is a constant that depends 
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Figure 6. The accuracy with which each of the source finding 
packages determines the position of sources. The dotted gray 
curve is the expected accuracy for an ideal elliptical Gaussian 
fit. 
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Figure 7. The accuracy with which each of the source finding 
packages measures the flux of sources as a function of the reported 
flux. The dotted gray line is the expected accuracy for an ideal 
elliptical Gaussian fit. 



on t he morphology of the source and the convolving beam 
(see ICondonI (| 19971 ) for detailed a calculation). The MAD 
in position of an ideal source finding algorithm is calculated 
semi-analytically by assuming that each source in the input 
catalog has measurement error of C/SNR^ This ideal curve 
is plotted in Figure (6] 

As is expected, the accuracy with which a source posi- 
tion can be measured increases with flux, and is in agreement 
with the performance of an ideal Gaussian fitting routine, 
which is shown as a dotted curve in Figure [6] The devia- 
tions from ideal behaviour that can be seen in Figure [6] for 
the various source finders at high SNR are artifacts of the 
reporting accuracy of the packages. For example, imsad re- 
ports positions to a resolution of 0.1" and therefore cannot 
achieve a median absolute deviation in position better than 
~ 0.1". SFIND has similar problems at an SNR of > 3000. 
The median absolute position deviation for Aegean and 
Selavy will also deviate from ideal, but at an SNR in excess 
of the 40, 000 reported in Figure (6) SExtractor does not use 
Gaussian fitting to characterise source positions and there- 
fore does not perform as well as the ideal at SNR greater 
than 50. At an SNR of < 100 Selavy has a median abso- 
lute position deviation that is higher than the ideal. This is 
because of source fragmentation. 

Figure [7] shows the MAD in fiux as a fraction of total 
fiux, as a function of SNR. Again the ideal behavior of a 
Gaussian fitter has been shown by a dashed curve. Overall 
the source finding packages report fluxes that are consistent 
with the expected ideal Gaussian flt, the exceptions being 
SExtractor above an SNR of 50, and Selavy at an SNR below 
50. SExtractor deviates from the ideal and has a plateau at 
1% flux accuracy. In this work we use the corrected isophotal 
fluxes (FLUXJSOCOR) from SExtractor. Of ah the meth- 
ods that are available for measuring fluxes in radio synthesis 
images, the corrected isophotal fluxes was found to be the 
most accurate. Selavy deviates from the ideal case and has 
a flux accuracy of about 1/2 of ideal. The cause of this devi- 
ation is source fragmentation in which each component has 
only a fraction of the total true flux (see §[5]2]and Figure |3]). 



5.6 Initial evaluation summary 

Each of the source finding algorithms conforms to a high 
standard of completeness and reliability, and is able to pro- 
duce a robust catalogue of a statistically large number of 
sources, with accurate measurements of position and flux. 
The completeness of the Aegean source finding package is 
as good or better than any of the other packages, and has 
been achieved without sacrificing reliability. In the context 
of next generation radio surveys, we are interested in the 
small differences between each of these source finders, and 
in how to optimize the approach to avoid even the residual 
small level of incom pleteness and false detection rate. In sur- 
veys such as EMU (|Norris et al.ll201ll '). with an expected 70 
million sources, an FDR of even 1% translates into 700, 000 
false sources. This clearly has an impact on the study of 
rare or unusual behaviour. In particular we are interested 
in how the source finding algorithm affects the final output 
catalogue at a level that is far more detailed than previously 
explored. With this in mind we now delve into specific cases 
in which existing source finding packages fail. 



6 MISSED SOURCES 

We are now at the stage where we can consider the real 
sources that were missed by the source finding packages, as 
well as the false detections that these programs generate. 
There are two populations of sources that are missed by one 
or more of the source finding packages as will be discussed 
in § 16.1116.21 below. 

6.1 Isolated faint sources 

In the simulated image, for which no clustering was taken 
into account, 99.5% of the islands contained a single source. 

The first population of sources that was not well de- 
tected by the source finding algorithms are isolated faint 
sources. These sources have a true flux greater than the 
threshold, but have few or no pixels above the threshold 
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due to the addition of noise, sfind and SExtractor require 
an island to have more than some minimum number of pixels 
for it to be considered a candidate source, imsad, Selavy and 
Aegean have no such requirement. The number of sources 
that are not seen in a catalogue due to the effects of noise can 
be calculated directly and is essentially the inverse problem 
to that of false detections. A correction can be applied to any 
statistical measure extracted from the catalogue in order to 
account for these missed sources. The only way to recover 
all sources with a true flux above a given limit, is to have 
a threshold that is well below this limit, either by produc- 
ing a more sensitive image, or by accepting a larger number 
of false detections. Since this noise affected population of 
sources cannot be reduced by an improved source finding 
algorithm, and can be accounted for in a statistically robust 
way, we will consider this population to be non-problematic. 

6.2 Islands with multiple sources 

The second population of sources that is not well detected by 
the source finding packages are the sources that are within 
an island of pixels that contains multiple components. Ex- 
amples of such islands are shown in Figures [S] [9l and 1101 
If a source finding algorithm is unable to correctly charac- 
terise multiple sources within an island, some or all of these 
sources will be missed. There are two approaches used by 
the tested algorithms to extract multiple sources from an 
island of pixels - iterative fitting and de-blending. Each of 
these approaches can fail to characterise an island of sources 
for different reasons, and will now be discussed in detail. 



6.2.1 Iterative fitting 

The first approach to characterising an island of multiple 
components is an iterative one which relies on the notion of 
a fitting residual. The fitting residual is the difference be- 
tween the data and the model fit. In the iterative approach 
a single Gaussian is fit to the island and the fitting residual 
is inspected. If the fitting residual meets some criterion then 
the fit is considered to be 'good' and a single source is re- 
ported. If the residual is 'poor' then the fit is redone with an 
extra component. Once either the fitting residual is found 
to be 'good' or some maximum number of components has 
been fit, the iteration stops and the extracted sources are re- 
ported. A disadvantage of this method is that if the number 
of allowed Gaussians (n) is poorly chosen, islands contain- 
ing single faint sources can have a 'better' fitting residual 
when fit by multiple components, and source fragmentation 
occurs. When a source is fragmented it is difficult to extract 
the overall source parameters from the multiple Gaussians 
that were used in the fitting of the source. In particular the 
source flux is not simply the sum of the flux of the frag- 
ments. If the chosen value of n is too small then not all 
of the sources within an island will be characterised. These 
uncharacterised sources will contaminate the fitting of the 
previously identified sources resulting in a poor characteri- 
sation of the island. 

When the fiux ratio of components within an island of 
pixels becomes very large, an iterative fitting approach can 
fail. The cause of this failure is related to the performance 
of an ideal Gaussian fitting routine. Figure [7| shows the frac- 
tional error in measuring the amplitude of a Gaussian. For 
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Figure 8. Top Left: A section of the simulated image. Remain- 
der. The fitting residual for each of the source finding algorithms. 
Aegean was the only algorithm to fit all three sources, over both 
islands. 



high SNR sources, the absolute flux error can be orders of 
magnitude below the rms image noise, so it may be expected 
that the maximum fiux in the fitting residual should also be 
at or below the rms image noise. However the main contri- 
bution to the flux seen in the fltting residual is not from 
amplitude errors but from errors in estimating the FWHM 
of the source. 

The amplitude difference between a (ID) Gaussian of 
amplitude A and FHWM of (= 2 a/2 log 2a) and a second 
Gaussian of identical amplitude A and FWHM of 6*' = -f 
A9, is given by F{x): 



F{x) = A 



which has maxima at 



In^ 



(5) 



(6) 



" 21n2 \^6l'^ - 6I2 
As a fraction of the true flux, the maximum residual is then 

The typical error in the measurement of 9 is (|Condonlll997h 
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Figure 9. Top Left: A section of the simulated image. Remain- 
der: The fitting residual for each of the source finding algorithms. 
Aegean and SFIND were able to correctly identify and characterise 
the two components but others were not. 




Figure 10. Left: An island of pixels from the simulated image 
containing both a 9Jy source and a 1.7mjy source. Right: The 
fitting residual formed by subtracting a (Aegean) fitted model of 
the 9Jy source from the data. The pixel scale is — 3(t (white) to 
+5o- (black) with contours at an SNR of ±5, ±50, ±500, ±5000 in 
contrasting tones. The flux of the source and its major axis have 
both been measured to within 0.05% of the true value and yet 
the fitting residual has peaks at an SNR of over 500. 

SO that a source with an SNR of A/ a wih have a fitting 
residual with an SNR of 

From the Equation [9] it is clear that in an island whose 
brightest source has an SNR of A/a, sources below an 



SNR of F{xo)/a will not be detected by an iterative fitting 
method. The fitting residual exceeds 5a at an SNR as low 
as 11. Therefore, even an ideal Gaussian fitting routine will 
miss 5a sources that are within the same island as a source 
of ^ llcr if an iterative approach is taken. If two Gaussian 
components are fit to an island of pixels such as that shown 
in Figure 1101 and the positions are left unconstrained, the 
fainter component will migrate towards one of the maxima 
in the fitting residual. The brighter source will then be char- 
acterised by two Gaussians, and the fainter source by none. 
The final result, is that neither of the sources will be well 
characterised. It is therefore essential that a source finding 
algorithm has some method for determining the number of 
Gaussian components within an island, as well as a way 
to stop the fitting process from mis-characterising the two 
sources. A process called sectioning or de-blending is a com- 
mon method. 

6.2.2 Sectioning or de-blending 

A second approach to characterising islands with multiple 
sources is to use the distribution of fiux within the island 
to determine the number of components to be fit, and then 
fit the components. This approach relies on some a priori 
knowledge of what a source looks like to break an island 
into components, sfind, SE, and Aegean all use a form of 
sectioning to generate an initial estimate of the number of 
sources to be fit, as well as the starting parameters. 

It is possible to create a statistical measure that will 
account for the number of sources that are missed because 
there are multiple sources within an island of pixels. This 
would, however, require detailed knowledge of the source 
finding algorithm, the flux distribution of the source pop- 
ulation, and the flux dependent two-point correlation func- 
tion. The complexity of this calculation means that it is 
never computed and sometimes not even considered. Since 
many variable phenomena appear in or near known sources 
(eg., radio supernovae in galaxies, extreme scattering events 
within our own Galaxy, and more), an inability to accurately 
characterise this population of sources will make it difficult 
or impossible to reliably detect and characterise many vari- 
able events. 



7 THE NEW SOURCE FINDING PROGRAM: 

Aegean 

With an understanding of how the underlying algorithms 
affect a source finder's ability to find and accurately charac- 
terise islands of pixels, we have created a new source finding 
algorithm. The goal of the new algorithm is to incorporate 
the reliability and completeness performance of the pack- 
ages studied in § I3I6I whilst improving on their ability to 
characterise islands of pixels. The source finding algorithm 
is called Aegean, as it deals with many islands. 

As background estimation and subtraction are not part 
of the focus of this work, Aegean has been designed with 
only a simple background estimation algorithm. For the 
analysis presented, Aegean was run with a detection thresh- 
old of 125/iJy/beam. Aegean uses the FloodFill algorithm 
described in § 13.51 to create islands of pixels. 

Aegean makes use of the notion of a single curvature 
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map to characterise an island of pixels. The curvature k of 
a function f{x) is given by: 



Curvature Map 



(l + ,f 

l|Reillvlll98d '). For a Gaussian with a FHWM of k pixels, 
-16a; In 2 _x£8in2 



(10) 



fc2 



(11) 



(12) 



so that /' has a maxima at a; = fc/\/2, and 

,2 1 (In 2)^ 
fc2 29 • 

.2 

For a Gaussian with k ^ 1, f <C 1 and we can approximate 

KC^f". (13) 

The curvature of a surface in a particular direction can be 
defined using Eauation ll3l where the differentiation i s along 
a unit vector in the chosen direction. iMolinarl et al.l (I2OI1I ) 
calculate the curvature of their input image in four image 
directions in order assist their source finding and charac- 
terisation. We combine these four curvature measurements 
to calculate the mean curvature of an image. For an image 
convolved with a Gaussian with a FWHM of k pixels, the 
(mean) curvature, k is equal to the mean o f k calculated 
in any two orthogonal directions (|Reillvll 19821 ). The discrete 
2D Laplacian kernel 



1 



(14) 



calculates the sum of the second derivatives in four direc- 
tions. Convolving the input image with L'^y will therefore 
produce a map of 2k - a single curvature map. 

Islands of pixels are fit with multiple Gaussian compo- 
nents. The number of components to be fit is determined 
from a curvature map. The curvature map will be nega- 
tive around local maxima. Groups of contiguous pixels that 
have negative curvature and fluxes above the threshold are 
called summits. An island of pixels will contain one or more 
summits. Aegean fits one component per summit, with the 
parameters of each of the components are taken from the 
corresponding summit. The position and flux are initially set 
to be equal to the brightest pixel within a summit, and the 
shape parameters (major/minor axis and position angle) are 
set to be the same as the convolving beam. Figure [T^ shows 
an example of two islands that contain multiple sources with 
the island boundaries and regions of negative curvature de- 
limited. In the example in the left panel of Figure [121 there 
are three regions of negative curvature that are completely 
within the green island. This island is fit with three Gaus- 
sians. In the example in the right panel of Figure [121 there 
are two regions of negative curvature that overlap with the 
island of pixels. One component is contained entirely within 
the island, whilst the other is only partly within the island. 
Only the region of negative curvature that is within the 
green island is considered when estimating the initial pa- 
rameters of the components. Both of the islands depicted 
in Figure [12] contains a source that is bright enough that 
the expected fitting residual would be brighter than any of 
the other components within the island, and therefore an 
iterative fitting approach would only fit a single component 




A curvature map is 
created from the input 
image. The curvature 
noise {o^^^J is caiculated. 
The image noise (a^^^) is 
set by the user. 
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The image is broken into disjoint 
Islands using the FloodFill 
algorithm. Islands are seeded 
with pixels above 0^= 5a^^. 
FloodFill includes surrounding 
pixels with o >4a . 



A clipping mask and curvature 
mask are created for each island. 
The curvature mask contains all 
pixels with a curvature < -o^^^^^. 
The clipping mask contains all 
pixels with an intensity > 5a . 
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Applying the curvature and 
clipping masks breaks the island 
into summits. These summits are 
used to determine the number of 
components to be fit, as well as 
the initial parameters for each of 
these components. 




Each of the Gaussian components 
are fit jointly with appropriate 
constraints that ensure the fits wil 
converge to an acceptable 
solution. Red ellipses show the 
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Figure 11. A demonstration of the operation of Aegean, using 
a single multiple component island as example. The input image 
is used to create a curvature map from which the curvature noise 
cTcurve is Calculated. FloodFill is used to break the image into 
islands of pixels. The number of components in an island is esti- 
mated using a combination of the curvature mask and threshold 
mask. An elliptical Gaussian is fit to each of the components in 
the island simultaneously. 



(see § 16.2. ip . Since the island of pixels in the right panel of 
Figure [12] has two summits, Aegean is able to accurately 
detect and characterise both components. Islands of pixels 
that contain only a single source have only a single summit 
and are fit with a single component. 

To avoid faint components migrating to the fitting resid- 
ual of brighter components, the position of each of the com- 
ponents is constrained to be within the corresponding sum- 
mit. The flux of each component must be greater than 5cr. 
For low SNR sources, the true flux can be significantly differ- 
ent from the intensity of the brightest pixel in the summit, 
Smax- For high SNR sources such noise variations are less 
important and beam sampling effects become more impor- 
tant. 

For an image with a sampling rate of k pixels per beam 
a source of flux S which is located at the intersection of 
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Figure 12. Two examples of the curvature analysis scheme. The 
grayscale represents the flux density map and ranges from —3(7 
(white) to +10(T (black). The green contour is at 5<j and represents 
the island boundary. The red contours are where the curvature 
map changes from positive to negative. Regions surrounded by a 
red contour have negative curvature and are the local maxima. 



four pixels will effectively be sampled \/2 * 9 /2k pixels from 
the centre of the source. The intensity of the peak pixel is 
therefore given by: 

/ 
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(15) 



= 5-2"^, (16) 

where 6 is the FWHM of the source. The flux of each compo- 

2 

nent is therefore constrained to be less than Smax • 2 *^ + So". 

A Gaussian function has negative curvature from the 
peak out to ±FWHM/x/2. The size of a summit is there- 
fore used to constrain the component size. The major and 
minor axes of a component must be larger than the syn- 
thesised beam, and must remain smaller than y/2 times the 
width of the summit. Beam sampling effects again play a 
role here, and so in Aegean, we increase the limits on the 
major and minor axes each by two pixels to account for this. 
If the summit is smaller than the synthesised beam then the 
component is fit with the PSF. 

The performance of Aegean has been presented in ii l5l6l 
along with the other source finding algorithms under study. 



Aegean makes use of a curvature image which is de- 
rived from the input image with a Laplacian transform. Us- 
ing the curvature image Aegean is able to accurately deter- 
mine the number of compact components within an island 
of pixels and produce a set of initial parameters and limits 
for a constrained fit of multiple elliptical Gaussians. 

Aegean has been shown to produce catalogues with a 
5cr completeness that is better than our estimation of an 
ideal source finder. This completeness has been achieved 
without sacrificing reliability, and Aegean is the most reli- 
able of the tested algorithms. The next generation of radio 
surveys will be sensitive enough that ~ 5% of the islands in 
the image will contain multiple components and therefore 
the ability to characterise such islands is of critical impor- 
tance. Aegean is able to accurately characterise islands of 
pixels which contain multiple compact components. 

We have shown that in order to improve the reliabil- 
ity and completeness of source catalogues it is necessary to 
perform constrained multiple Gaussian fitting. An accurate 
estimation of initial parameters and sensible constraints are 
both critical when multiple component Gaussian fitting is 
performed. We have demonstrated a method for estimating 
and constraining the fitting parameters which is based on 
the curvature of the image. We anticipate that by adopting 
the Aegean algorithm, the next generation of radio contin- 
uum surveys will be able to achieve more complete, reliable 
and accurate catalogues without relying on significant man- 
ual intervention. 
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8 CONCLUSIONS 

Using a simulated data set, we have assessed the perfor- 
mance of some widely used source finding packages, along 
with the ASKAPsoft source finding program Selavy. These 
source finding packages are found to produce complete and 
reliable catalogues of isolated compact sources. We iden- 
tify two populations of sources that are not well detected 
by the source finding packages. The first population being 
faint sources close to the detection limit, and the second be- 
ing sources which are within an island of pixels containing 
multiple components. Islands of pixels with multiple compo- 
nents are found to be poorly characterised by source finding 
packages that take an iterative fitting approach to character- 
isation. Source finding packages that estimate the number 
of components in an island prior to fitting are less likely to 
mis-characterise the island. We have developed a new source 
finding package, Aegean, which is able to characterise the 
number of components within an island of pixels more ac- 
curately than any of the other packages tested. 



REFERENCES 

Adams, T. J., Bunton, J. D., & Kesteven, M. J. 2004, ExA, 
17, 279 

Bannister, K. W., Murphy, T., Gaensler, B. M., Hunstead, 
R. W., & Chatterjee, S. 2011, MNRAS, 412, 634 

Becker, R. H., White, R. L., & Helfand, D. J. 1995, ApJ, 
450, 559 

Bertin, E., & Arnouts, S. 1996, A&AS, 117, 393 
Chatterjee, S., Murphy, T., & VAST Collaboration. 2010, 
in Bulletin of the American Astronomical Society, Vol. 42, 
American Astronomical Society Meeting Abstracts #215, 
#470.12-f 
Condon, J. J. 1997, PASP, 109, 166 

Condon, J. J., Cotton, W. D., Greisen, E. W., Yin, Q. F., 
Perley, R. A., Taylor, G. B., & Broderick, J. J. 1998, As- 
tronomical Journal, 115, 1693 

Croft, S., Bower, G. C, Keating, G., Law, C, Whysong, 
D., Williams, P. K. G., & Wright, M. 2011, ApJ, 731, 34 

Eddington, A. S. 1913, MNRAS, 73 



14 Hancock et al. 



Hopkins, A. M., Miller, C. J., Connolly, A. J., Genovese, 
C, Nichol, R. C, & Wasserman, L. 2002, AJ, 123, 1086 

Huynh, M. T., Hopkins, A. M., Norris, R. P., Hancock, 
R J., Murphy, T., & Jurek, R. 2011, RASA (in press) 

Johnston, S., et al. 2008, ExA, 22, 151 

Lonsdale, C. J., et al. 2009, Rroceedings of the IEEE, 97, 
1497 

Mauch, T., Murphy, T., Buttery, H. J., Curran, J., Hun- 
stead, R. W., Piestrzynski, B., Robertson, J. G., & Sadler, 
E. M. 2003, MNRAS, 342, 1117 

Molinari, S., Schisano, E., Faustini, F., Pestalozzi, M., Di 
Giorgio, A. M., & Liu, S. 2011, Astronomy & Astro- 
physics. .530. A133 

Murphy, T., Mauch, T., Green, A., Hunstead, R. W., 
Picstrzynska, B., Kels, A. P., & Sztajer, P. 2007, MNRAS, 
382, 382 

Norris, R. P., et al. 2011, PAS A, 28, 215 

Reilly, R. C. 1982, The American Mathematical Monthly, 

89, 180 

Roerdink, J., & Meijster, A. 2001, Fundamenta Informati- 
cae, 41, 187 

Sault, R. J., Teuben, P. J., & Wright, M. C. H. 1995, 

in Astronomical Society of the Pacific Conference Scries, 
Vol. 77, Astrnonomical Data Analysis Software and Sys- 
tems IV, ed. R. Shaw, H. Payne, & J. Hayes, 433 



